Vincent Clemson  


Education

The Pennsylvania State University 2017 - B.S. in Mathematics | Minor in Statistics


Background

👋 Hi, I’m Vince. Over the past 2 months, I’ve driven 8,000+ miles traveling across the country 🚗💨
In my blog, I show how to use open source tools to analyze geospatial data. Previously, on the AI Edge Kit team at Booz Allen, I conducted a study of NATO[1] object detectors using commercial satellite imagery. Prior to this, I worked at Peraton for 5 years on a GEOINT[2] performance modeling team, where I analyzed billions of records of transactional data on geospatial imagery stored by the NGA[3] to help optimize the NSG[4] and benefit the US military. There, I promoted open source software development best practices and the use of data science tooling in Python & R. Additionally, I managed my team’s government network GitLab & corporate GitHub organizations, which grew from 0 to 300+ repositories over my tenure.


What I’m interested in doing

Saving the world. That’s a pretty tough job, but someone’s gotta do it. Applying my skills in data science & spatial analytics to collaborate with a team that’s working on a small part of this big mission is a dream of mine.


Open Source Projects


Programming Skills

I have hands-on expertise in using various Machine Learning, Data Analysis, Geospatial & Visualization Packages, Web Technologies (app frameworks & scraping tools), SSGs[5], & Notebook / Computational Medium tools.
Below is a non-exhaustive, high-level list of the technologies that I work with.
Python, Conda/Mamba, Jupyter, nbdev, Sphinx, Cookiecutter, SQL, JavaScript, Bash, Zsh, tmux, VSCode, R, Quarto, R Markdown, GNU Make, asciinema, Leafmap, Google Earth Engine, QGIS, GDAL

Career Path

AI Engineer - Booz Allen Hamilton

  • Conducted a statistical performance analysis to evaluate third-party geospatial computer vision algorithms 🛩🏘️
    (e.g. aircraft & building detectors on Maxar satellite imagery, analyzed with polars, {sf}, & {terra}, with results documented in Quarto)
  • Built interactive satellite imagery data lakes (e.g. using Leafmap, TiTiler, & STAC[6])
  • Built interactive statistical reports for JSOC[7] on soldier biometric performance data w/ {flexdashboard}

Systems Engineer - Peraton

Data science on the NGA’s enterprise systems engineering contract

  • Analyzed the metadata & transactions of all Geospatial Intelligence imagery products across the IC
  • Wrangled large amounts of historical categorical, numerical, spatial, & temporal data using extremely efficient in-memory & out-of-memory tools (e.g. data.table, Apache Arrow, & Parquet file data lakes)
  • Prototyped, developed, & maintained modeling tools for EDA of patterns & trends in the data (e.g. ggplot2, sf, Plotly, Matplotlib, Leaflet, Dash, Shiny, Docker, & Cloud Foundry)
  • Performed spatial relational/geometric operations on datasets to enrich feature sets (e.g. border regions)
  • Used reproducible computational mediums to conduct workflows (e.g. R Markdown, Jupyter notebooks)
  • Conducted statistical analysis of the performance, sizing, & budgeting of NSG imagery & their driving relationships
    (e.g. linear trend models, bandwidth models, human-in-the-loop supervised/unsupervised EDA ML workflows)
  • Conducted statistical & orbital mechanics analyses of the performance of an ABI[8] satellite / ground sensor system
  • Worked on a distributed team & operated in a cloud computing environment. Gained experience building a cloud environment from the ground up, including config management & permissions (e.g. AWS, RStudio Server Pro, Unix/Linux, VPC)

Application Developer Intern - JPMorgan Chase

  • Worked on an Agile development team in JPMC’s Technology Analyst Program. Our team of 6 interns built a full-stack Java-Spring tool that aggregated data for planning & executing the migration & decommissioning of legacy JPMC data center servers. Worked on both the front end & back end, and served as Scrum Master.

Data Analytics Intern - IMG Learfield & Penn State Athletics

Season Ticket Holder Survey Analysis

  • Performed decision tree modeling in R to find trends between customers and ticket sale renewals
  • Mined customer survey data using NLP[9] techniques & the NLTK[10] in Python (e.g. tokenizers, collocations)


Machine Learning Skills

Some topics I’ve dived into while applying machine learning workflows to NSG transactional log data:
Spatial Cross-Validation Techniques, Discrete Event Simulation, Generalized Linear Models, Ensemble Models, Unsupervised Learning, Principal Components Analysis, Clustering Techniques, Dimensionality Reduction (Feature Selection)
Other topics I’ve dived into through various ML work (e.g. rebuilding Deep Learning with PyTorch):
CNNs (Convolutional Neural Networks), GANs (Generative Adversarial Networks), Gradient Descent, Regularization, Decision Boundary, One-vs-All Multiclass Classification, Neural Networks, Vectorization, Backpropagation, and Advanced Optimization techniques

  1. NATO - North Atlantic Treaty Organization

  2. GEOINT - Geospatial Intelligence

  3. NGA - National Geospatial-Intelligence Agency

  4. NSG - National System for Geospatial-Intelligence

  5. SSG - Static Site Generator

  6. STAC - SpatioTemporal Asset Catalogs

  7. JSOC - Joint Special Operations Command

  8. ABI - Activity Based Intelligence

  9. NLP - Natural Language Processing

  10. NLTK - Natural Language Toolkit